Tautomerism in large databases

Authors

  • Markus Sitzmann
  • Wolf-Dietrich Ihlenfeldt
  • Marc C. Nicklaus
Abstract

We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. We used CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, which takes a comprehensive stance on the types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomeric overlap within the same individual database (i.e. at least one other entry was present that was only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
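The within-database overlap rate described above can be illustrated with a minimal Python sketch. It assumes each record already carries two precomputed identifiers: a tautomer-sensitive one (`exact_id`) and one derived from the canonical tautomer (`taut_id`). Both field names, the function name, and the sample data are hypothetical; they are not the CACTVS or NCI CADD identifiers, and this is not the authors' implementation.

```python
from collections import defaultdict


def within_database_overlap(records):
    """Per database, return the fraction of records for which another entry
    exists that is a different tautomeric form of the same compound.

    records: list of (database, exact_id, taut_id) tuples, where exact_id is a
    hypothetical tautomer-sensitive identifier and taut_id is a hypothetical
    identifier of the canonical tautomer.
    """
    # database -> canonical tautomer id -> set of distinct exact structures
    groups = defaultdict(lambda: defaultdict(set))
    for db, exact_id, taut_id in records:
        groups[db][taut_id].add(exact_id)

    totals = defaultdict(int)
    overlapping = defaultdict(int)
    for db, exact_id, taut_id in records:
        totals[db] += 1
        # More than one distinct exact structure maps to the same canonical
        # tautomer, so this record has a tautomeric counterpart in its database.
        if len(groups[db][taut_id]) > 1:
            overlapping[db] += 1

    return {db: overlapping[db] / totals[db] for db in totals}


# Made-up example data: "keto-1" and "enol-1" stand for two tautomers of one compound.
records = [
    ("db_a", "keto-1", "T1"),
    ("db_a", "enol-1", "T1"),
    ("db_a", "benzene", "T2"),
    ("db_a", "ethanol", "T3"),
    ("db_b", "keto-1", "T1"),
]
rates = within_database_overlap(records)
```

Exact duplicates (same `exact_id` twice) are deliberately not counted as tautomeric overlap in this sketch, since they are the same representation rather than a different tautomer.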

Related articles

Tautomerism in computer-aided drug design.

Tautomers are often disregarded in computer-aided molecular modeling applications. Little is known about the different tautomeric states of a molecule and they are rarely registered in chemical databases. Tautomeric forms of a molecule differ in shape, functional groups, surface, and hydrogen-bonding pattern. Calculation of physical-chemical properties and molecular descriptors differ from one ...

Enumeration of Ring–Chain Tautomers Based on SMIRKS Rules

A compound exhibits (prototropic) tautomerism if it can be represented by two or more structures that are related by a formal intramolecular movement of a hydrogen atom from one heavy atom position to another. When the movement of the proton is accompanied by the opening or closing of a ring it is called ring-chain tautomerism. This type of tautomerism is well observed in carbohydrates, but it ...

Tautomerism in structure-based 3D pharmacophore modeling

Tautomeric rearrangements on molecules lead to distinct equilibrated structural states of the same chemical compound, and evidently, have an impact on nearly all aspects of computer-aided chemical data processing [1] where the knowledge of the exact chemical structure is required (e.g. the calculation of chemical properties or interpretation of ligand-protein interactions). Although tautomerism...

Thermodynamic Study and Total Energy Calculation for three systems of Enol↔Keto Tautomerism

Using Hartree–Fock (HF) and Density Functional Theory (DFT) calculations the thermodynamic properties such as thermal energy, thermal enthalpy, thermal entropy, thermal Gibbs free energy, heat capacity (Cv), and molecular structures of several species involved in keto↔enol tautomerism related to acetaldehyde (A), 5,5-dimethyl-1,3-cyclohexanedione (dimedone) and acetylacetone (AA) h...

Using Implicit/Explicit Solvation Models to Theoretical Study Tautomerism in 7H-purine-2,6-diamine

A theoretical study at the B3LYP/6-31++G(d,p) level was performed on the tautomerization of 7H-purine-2,6-diamine into 9H-purine-2,6-diamine. Such a tautomerism can take place via three different pathways, namely A, B, and C. The energetic results associated with the gas phase reveal that pathways A, B, and C display a very high activation Gibbs free energy of 45.1, 68.6 and 48.9 kcal/mol, resp...

Journal title:

Volume 24, Issue

Pages -

Publication date: 2010